Pandas Seaborn

In this notebook we'll look at how to combine three things: the composability and complex visualizations that HoloViews provides, the power of pandas dataframes for manipulating tabular data, and the attractive statistical plots and analyses provided by the Seaborn library.

We also explore how a pandas DataFrame can be wrapped in a general-purpose Element type, DFrame, which can either be converted into other standard Element types or be visualized directly using a wide array of Seaborn-based plotting options.

This tutorial assumes you're already familiar with some of the core concepts of HoloViews, which are explained in the other Tutorials.

This tutorial requires NumPy, Pandas, and Seaborn to be installed and imported:

In [1]:
import itertools

import numpy as np
import pandas as pd
import seaborn as sb

np.random.seed(9221999)

import holoviews
from holoviews import *

We can now select static and animation backends:

In [2]:
%load_ext holoviews.ipython
%output holomap='widgets' fig='svg'

Visualizing Distributions of Data

If import seaborn succeeds, HoloViews will provide a number of additional Element types, including Distribution, Bivariate, TimeSeries, Regression, and DFrame (a Seaborn-visualizable version of the DFrame Element class provided when only pandas is available).

We'll start by generating a number of Distribution Elements containing normal distributions with different means and standard deviations and overlaying them. Using the %%opts magic you can set plot and style options as usual; here we deactivate the default histogram and shade the kernel density estimate:

In [3]:
%%opts Distribution (hist=False kde_kws=dict(shade=True))
d1 = 25 * np.random.randn(500) + 450
d2 = 45 * np.random.randn(500) + 540
d3 = 55 * np.random.randn(500) + 590
Distribution(d1, label='Blue') *\
Distribution(d2, label='Red') *\
Distribution(d3, label='Yellow')
Out[3]:
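
The arrays above rely on the fact that scaling a standard normal sample by a standard deviation and adding a mean gives draws from a normal distribution with those parameters. As a rough check of that relationship, here is a sketch that uses only the standard library; the helper name `scaled_normal` is made up for illustration and is not part of NumPy or HoloViews:

```python
import random
import statistics

def scaled_normal(mean, sd, n, rng):
    """Draw n samples as sd * z + mean, where z ~ N(0, 1).

    Mirrors the `sd * np.random.randn(n) + mean` idiom used above,
    but with the standard library only.
    """
    return [sd * rng.gauss(0.0, 1.0) + mean for _ in range(n)]

rng = random.Random(9221999)  # fixed seed so the run is repeatable
samples = scaled_normal(mean=450, sd=25, n=5000, rng=rng)

# The sample statistics should land close to the requested parameters.
sample_mean = statistics.mean(samples)
sample_sd = statistics.stdev(samples)
print(round(sample_mean, 1), round(sample_sd, 1))
```

The printed values should be close to 450 and 25, which is what the first Distribution in the overlay is expected to be centred on.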

Thanks to Seaborn you can choose to plot your distribution as histograms, kernel density estimates, or rug plots:

In [4]:
%%opts Distribution (rug=True kde_kws={'color':'indianred','linestyle':'--'})
Distribution(np.random.randn(10), key_dimensions=['Activity'])
Out[4]:

We can also visualize the same data with Bivariate distributions:

In [5]:
%%opts Bivariate.A (shade=True cmap='Blues') Bivariate.B (shade=True cmap='Reds') Bivariate.C (shade=True cmap='Greens')
Bivariate(np.array([d1, d2]).T, group='A') +\
Bivariate(np.array([d1, d3]).T, group='B') +\
Bivariate(np.array([d2, d3]).T, group='C')
Out[5]:

This plot type also has the option of enabling a joint plot with marginal distributions along each axis, and the kind option lets you control whether to visualize the distribution as a scatter, reg, resid, kde or hex plot:

In [6]:
%%opts Bivariate [joint=True] (kind='kde' cmap='Blues')
Bivariate(np.array([d1, d2]).T, group='A')
Out[6]:

Bivariate plots also support overlaying and animations, so let's generate some two dimensional normally distributed data with varying mean and standard deviation.
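
One way such data might be generated is sketched below. This is only an assumption about the general shape of the data, written with the standard library; the function `bivariate_normal_samples` and the parameter grids are hypothetical:

```python
import itertools
import random

def bivariate_normal_samples(mean, sd, n, rng):
    """Return n (x, y) pairs with independent N(mean, sd^2) coordinates."""
    return [(rng.gauss(mean, sd), rng.gauss(mean, sd)) for _ in range(n)]

rng = random.Random(0)
means = [0.0, 1.0, 2.0]   # assumed grid of means
sds = [0.5, 1.0]          # assumed grid of standard deviations

# One dataset per (mean, sd) combination; each could in principle be
# wrapped in a Bivariate Element and collected in a HoloMap.
datasets = {(m, s): bivariate_normal_samples(m, s, 200, rng)
            for m, s in itertools.product(means, sds)}

print(len(datasets), len(datasets[(0.0, 0.5)]))  # 6 200
```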

Working with TimeSeries data

Next let's take a look at the TimeSeries Element type, which allows you to visualize statistical time-series data. TimeSeries data can take the form of a number of observations of some dependent variable at multiple timepoints. By controlling the plot and style options, the data can be visualized in a number of ways, including confidence intervals, error bars, traces or scatter points.

Let's begin by defining a function to generate sine wave time courses with varying phase and noise levels.

In [7]:
def sine_wave(n_x, obs_err_sd=1.5, tp_err_sd=.3, phase=0):
    x = np.linspace(0+phase, (n_x - 1) / 2+phase, n_x)
    y = np.sin(x) + np.random.normal(0, obs_err_sd) + np.random.normal(0, tp_err_sd, n_x)
    return y
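
Note that `sine_wave` adds two different kinds of noise: `np.random.normal(0, obs_err_sd)` is a single scalar, so it shifts the whole time course by one offset, whereas `np.random.normal(0, tp_err_sd, n_x)` draws a separate value for every timepoint. The following standard-library sketch mirrors that structure (the name `sine_wave_py` is hypothetical). With the per-timepoint noise switched off, the residual from a clean sine should be the same constant at every timepoint:

```python
import math
import random

def sine_wave_py(n_x, obs_err_sd=1.5, tp_err_sd=0.3, phase=0.0, rng=random):
    """Pure-Python analogue of sine_wave above (illustrative only)."""
    stop = (n_x - 1) / 2
    xs = [phase + stop * i / (n_x - 1) for i in range(n_x)]
    offset = rng.gauss(0.0, obs_err_sd)  # one draw per observation
    ys = [math.sin(x) + offset + rng.gauss(0.0, tp_err_sd)  # one draw per timepoint
          for x in xs]
    return xs, ys

rng = random.Random(1)
xs, ys = sine_wave_py(31, obs_err_sd=1.5, tp_err_sd=0.0, rng=rng)
residuals = [y - math.sin(x) for x, y in zip(xs, ys)]

# With tp_err_sd=0 every residual equals the single observation offset.
spread = max(residuals) - min(residuals)
print(spread < 1e-9)
```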

Now we can create HoloMaps of sine and cosine curves with varying levels of observational and independent error.

In [8]:
sine_stack = holoviews.HoloMap(key_dimensions=['Observation error','Random error'])
cos_stack = holoviews.HoloMap(key_dimensions=['Observation error', 'Random error'])
for oe, te in itertools.product(np.linspace(0.5,2,4), np.linspace(0.5,2,4)):
    sines = np.array([sine_wave(31, oe, te) for _ in range(20)])
    sine_stack[(oe, te)] = TimeSeries(sines, label='Sine', group='Activity',
                                      key_dimensions=['Time', 'Observation'])
    # A phase shift of pi/2 turns the sine into a cosine: sin(x + pi/2) == cos(x)
    cosines = np.array([sine_wave(31, oe, te, phase=np.pi / 2) for _ in range(20)])
    cos_stack[(oe, te)]  = TimeSeries(cosines, group='Activity',label='Cosine', 
                                      key_dimensions=['Time', 'Observation'])
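
`itertools.product` over the two `np.linspace(0.5, 2, 4)` arrays visits every pairing of the four error levels, so each HoloMap ends up with 16 keys. A small sketch of that key grid using only the standard library (the `linspace` helper is a stand-in for `np.linspace`, written here purely for illustration):

```python
import itertools

def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive (stand-in for np.linspace)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

levels = linspace(0.5, 2, 4)
keys = list(itertools.product(levels, levels))

print(levels)      # [0.5, 1.0, 1.5, 2.0]
print(len(keys))   # 16
```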

First let's visualize the sine stack with a confidence interval:

In [9]:
%%opts TimeSeries [apply_databounds=True] (ci=95 color='indianred')
sine_stack
Out[9]:

And the cosine stack with error bars:

In [10]:
%%opts TimeSeries (err_style='ci_bars')
cos_stack.last
Out[10]:

Since the %%opts cell magic has applied the style to each object individually, we can now overlay the two with different visualization styles in the same plot:

In [11]:
cos_stack.last * sine_stack.last
Out[11]:

Let's apply the databounds across the HoloMap again and visualize all the observations as unit points:

In [12]:
%%opts TimeSeries (err_style='unit_points')
sine_stack * cos_stack
Out[12]:

Working with pandas DataFrames

To make this a little more interesting, we can use some of the real-world datasets provided with the Seaborn library. The HoloViews DFrame object can be used to wrap the pandas dataframes that Seaborn loads, like this:

In [13]:
iris = DFrame(sb.load_dataset("iris"))
tips = DFrame(sb.load_dataset("tips"))
titanic = DFrame(sb.load_dataset("titanic"))

By default the DFrame simply inherits the column names of the data frames and converts them into Dimensions. This works very well as a default, but if you wish to override it, you can either supply an explicit list of key_dimensions to the DFrame object or a dimensions dictionary, which maps from the column name to the appropriate Dimension object. In this case, we define a Month Dimension, which defines the ordering of months:

In [14]:
flights_data = sb.load_dataset('flights')
dimensions = {'month': Dimension('Month', values=list(flights_data.month[0:12])),
              'passengers': Dimension('Passengers', type=int),
              'year': Dimension('Year', type=int)}
flights = DFrame(flights_data, dimensions=dimensions)
In [15]:
%output fig='png' dpi=100 size=150
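
Supplying explicit `values` for the Month Dimension matters because month names do not sort chronologically on their own. Here is a minimal illustration in plain Python. The month list is written out by hand rather than read from the flights dataset, on the assumption that the first twelve rows of that dataset cover one year in calendar order:

```python
# Chronological order, analogous to the list passed as `values` above.
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

observed = ['March', 'January', 'December', 'February']

# Default string sorting is alphabetical, which scrambles the calendar order.
alphabetical = sorted(observed)

# Sorting by position in the explicit list restores the intended order.
chronological = sorted(observed, key=month_order.index)

print(alphabetical)    # ['December', 'February', 'January', 'March']
print(chronological)   # ['January', 'February', 'March', 'December']
```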

Flight passenger data

Now we can easily use the conversion methods on the DFrame object to create HoloViews Elements, e.g. a Seaborn-based TimeSeries Element and a HoloViews standard HeatMap:

In [16]:
%%opts TimeSeries (err_style='unit_traces' err_palette='husl') HeatMap [xrotation=30]
flights.timeseries(['Year', 'Month'], 'Passengers', label='Airline', group='Passengers') +\
flights.heatmap(['Year', 'Month'], 'Passengers', label='Airline', group='Passengers')

Out[16]:
:Layout
   .Passengers.Airline.I  :TimeSeries   [Year,Month]   (Passengers)
   .Passengers.Airline.II :HeatMap   [Year,Month]   (Passengers)
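
A HeatMap over `['Year', 'Month']` is essentially a pivot of the long-form table: one row per year, one column per month, with the passenger count in each cell. A small sketch of that reshaping with plain Python and invented numbers (not the real flights data, and not the HoloViews implementation):

```python
# Long-form records: (year, month, passengers); values are invented.
records = [
    (1949, 'January', 100), (1949, 'February', 110),
    (1950, 'January', 120), (1950, 'February', 130),
]

years = sorted({year for year, _, _ in records})
months = ['January', 'February']  # explicit order, as with the Month Dimension

# Look up each (year, month) cell; missing combinations would become None.
lookup = {(year, month): value for year, month, value in records}
grid = [[lookup.get((year, month)) for month in months] for year in years]

for year, row in zip(years, grid):
    print(year, row)
```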

Tipping data

A simple regression can easily be visualized using the Regression Element type. However, here we'll also split out smoker and sex as Dimensions, overlaying the former and laying out the latter, so that we can compare tipping between smokers and non-smokers, separately for males and females.

In [17]:
%%opts Regression [apply_databounds=True]
tips.regression('total_bill', 'tip', mdims=['smoker','sex'],
                extents=(0, 0, 50, 10), reduce_fn=np.mean).overlay('smoker').layout('sex')
Out[17]:
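
Conceptually, this fits one regression of tip on total bill for each combination of `smoker` and `sex`. The sketch below illustrates that idea with an ordinary least-squares line per group, using only the standard library and a few made-up rows. The data and the `fit_line` helper are hypothetical, and this is not a description of how HoloViews or Seaborn compute the fit internally:

```python
from collections import defaultdict

def fit_line(points):
    """Ordinary least-squares slope and intercept for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up rows: (total_bill, tip, smoker, sex)
rows = [
    (10.0, 2.0, 'No', 'Female'), (20.0, 3.0, 'No', 'Female'), (30.0, 4.0, 'No', 'Female'),
    (10.0, 1.0, 'Yes', 'Male'), (20.0, 3.0, 'Yes', 'Male'), (30.0, 5.0, 'Yes', 'Male'),
]

# Group the (x, y) pairs by the two categorical columns.
groups = defaultdict(list)
for bill, tip, smoker, sex in rows:
    groups[(smoker, sex)].append((bill, tip))

fits = {key: fit_line(points) for key, points in groups.items()}
for key, (slope, intercept) in sorted(fits.items()):
    print(key, round(slope, 3), round(intercept, 3))
```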

When you're dealing with higher-dimensional data you can also work with pandas dataframes by displaying the DFrame Element directly. This allows you to perform all the standard HoloViews operations on more complex Seaborn and pandas plot types, as explained in the following sections.

Iris Data

Let's visualize the relationship between sepal length and width in the Iris flower dataset. Here we can make use of one of the built-in Seaborn plot types, the pairplot, which plots each variable in a dataset against every other variable. We can customize this plot further by passing arguments via the style options, to define which plot types the pairplot will use and which dimension the hue option applies to.

In [18]:
%%opts DFrame (diag_kind='kde' kind='reg' hue='species')
iris.clone(label="Iris Data", plot_type='pairplot')
Out[18]:

When working with a DFrame object directly, you can select particular columns of your DFrame to visualize by supplying x and y parameters corresponding to the Dimensions or columns you want to visualize. Here we'll visualize the sepal_width and sepal_length by species as a box plot and violin plot, respectively.

In [19]:
%%opts DFrame [show_grid=False]
iris.clone(x='species', y='sepal_width', plot_type='boxplot') + iris.clone(x='species', y='sepal_length', plot_type='violinplot')
Out[19]:

Titanic passenger data

The Titanic passenger data is a larger dataset with many variables, so we can make use of some of the more advanced features of Seaborn and pandas. Above we saw the use of a pairplot, which lets you quickly compare each variable in your dataset against every other. HoloViews also supports Seaborn-based FacetGrids. The FacetGrid specification is simply passed via the style options, where the map keyword should be supplied as a tuple of the plotting function to use and the Dimensions to place on the x axis and y axis. You may also specify the Dimensions to lay out along the rows and columns of the plot, and the hue groups:

In [20]:
%%opts DFrame (map=('barplot', 'alive', 'age') col='class' row='sex' hue='pclass' aspect=1.0)
titanic.clone(plot_type='facetgrid')
Out[20]:

FacetGrids support most Seaborn and matplotlib plot types:

In [21]:
%%opts DFrame (map=('regplot', 'age', 'fare') col='class' hue='class')
titanic.clone(plot_type='facetgrid')
Out[21]:

Finally, we can summarize our data using a correlation plot and split out Dimensions using the .holomap method, which groups by the specified dimension, giving you a frame for each value along that Dimension. Here we group by the survived Dimension (with 1 if the passenger survived and 0 otherwise), which thus provides a widget to allow us to compare those two values.

In [22]:
%%output holomap='widgets' size=200
titanic.clone(titanic.data.dropna(), plot_type='corrplot').holomap(['survived'])

Out[22]:
:HoloMap   [survived]
   :DFrame   [fare,adult_male,embarked,deck,age,who,parch,pclass,sex,embark_town,alive,alone,sibsp,class]
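
The `.holomap(['survived'])` call can be thought of as a group-by: rows are partitioned by the value of `survived`, and each partition becomes one frame without that column. A rough standard-library analogy, using made-up records rather than the real Titanic data and a hypothetical `group_by` helper:

```python
from collections import defaultdict

def group_by(records, key):
    """Partition a list of dicts by the value stored under `key`."""
    frames = defaultdict(list)
    for record in records:
        # Drop the grouping column from each row, since it becomes the frame key.
        frames[record[key]].append({k: v for k, v in record.items() if k != key})
    return dict(frames)

# Made-up records for illustration only.
records = [
    {'survived': 0, 'age': 22.0, 'fare': 7.25},
    {'survived': 1, 'age': 38.0, 'fare': 71.28},
    {'survived': 1, 'age': 26.0, 'fare': 7.93},
    {'survived': 0, 'age': 35.0, 'fare': 8.05},
]

frames = group_by(records, 'survived')
print(sorted(frames))      # [0, 1]
print(len(frames[1]))      # 2
```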

As you can see, the Seaborn plot types and pandas interface provide substantial additional capabilities to HoloViews, while HoloViews allows simple animation, combinations of plots, and visualization across parameter spaces. Note that the DFrame Element is still available even if Seaborn is not installed, but it will use the standard HoloViews visualizations rather than Seaborn in that case.